Bandar Abbas
PerHalluEval: Persian Hallucination Evaluation Benchmark for Large Language Models
Hosseini, Mohammad, Hosseini, Kimia, Bali, Shayan, Zanjani, Zahra, Momtazi, Saeedeh
Hallucination is a persistent issue affecting all large language models (LLMs), particularly in low-resource languages such as Persian. PerHalluEval (Persian Hallucination Evaluation) is the first dynamic hallucination evaluation benchmark tailored to the Persian language. Our benchmark leverages a three-stage LLM-driven pipeline, augmented with human validation, to generate plausible answers and summaries for question-answering (QA) and summarization tasks, focusing on detecting extrinsic and intrinsic hallucinations. Moreover, we used the log probabilities of generated tokens to select the most believable hallucinated instances. In addition, we engaged human annotators to highlight Persian-specific contexts in the QA dataset in order to evaluate LLMs' performance on content specifically related to Persian culture. Our evaluation of 12 LLMs, including open- and closed-source models, using PerHalluEval revealed that the models generally struggle to detect hallucinated Persian text. We showed that providing external knowledge, i.e., the original document for the summarization task, could partially mitigate hallucination. Furthermore, there was no significant difference in hallucination between LLMs specifically trained for Persian and other models.
- Asia > Indonesia > Bali (0.04)
- Indian Ocean > Arabian Gulf (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Research Report > Experimental Study > Negative Result (0.68)
- Research Report > New Finding (0.66)
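The PerHalluEval abstract says the log probabilities of generated tokens were used to select the most believable hallucinated instances, but it does not say how those log probabilities were aggregated. The sketch below assumes a length-normalized score (mean token log probability) and keeps the highest-scoring candidate. The candidate strings, the numbers, and the helpers `mean_logprob` and `most_believable` are hypothetical illustrations, not PerHalluEval's code.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical input: a candidate hallucinated answer or summary paired with
# the per-token log probabilities reported by the model that generated it.
Candidate = Tuple[str, Sequence[float]]


def mean_logprob(token_logprobs: Sequence[float]) -> float:
    """Length-normalized score: average log probability per token."""
    if not token_logprobs:
        return -math.inf  # an empty generation should never be selected
    return sum(token_logprobs) / len(token_logprobs)


def most_believable(candidates: List[Candidate]) -> str:
    """Return the text of the candidate with the highest mean token log probability."""
    best_text, _ = max(candidates, key=lambda c: mean_logprob(c[1]))
    return best_text


if __name__ == "__main__":
    candidates = [
        ("candidate A", [-0.2, -0.4, -0.3]),
        ("candidate B", [-1.5, -0.9]),
        ("candidate C", [-0.1, -0.2, -0.9, -0.2]),
    ]
    print(most_believable(candidates))
```

With these toy numbers, candidate A (mean -0.30) beats candidate C (mean -0.35) and candidate B (mean -1.20). Averaging instead of summing keeps longer candidates from being penalized merely for having more tokens; whether PerHalluEval normalizes this way is not stated in the abstract.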
Automatic coherence-driven inference on arguments
CDI also offers a plausible approach for automatically making sense of competing arguments in a way that accords with the features enumerated here. This paper is part of an argument that it is now feasible to computationally instantiate a reasonable approximation of a coherence theory of truth [64]: the recent benchmark [12] provides additional quantitative evidence in this direction. By "hard-coding" acceptance of conclusively established propositions, this theory can furthermore be anchored in a correspondence theory of truth [65]. In other words, coherence computations can be required to incorporate privileged information that also coheres with observed reality. While it is easy to imagine attempts to do the same with privileged information that does not cohere with observed reality, lies cannot persist when they can easily be unraveled. Even with flawless technology (which this will not be), obstacles will be manifold. For example, in a pluralistic society, legal coherence may actually require sacrificing fairness in some ways [66]. Ultimately, people must decide matters for themselves. It is only reasonable to hope that technology can serve as a reliable tool to help people make their decisions more coherent.
- Asia > Russia (0.15)
- North America > United States > Kansas (0.06)
- Europe > Russia (0.05)
- Education (1.00)
- Law > Government & the Courts (0.94)
- Government > Regional Government > North America Government > United States Government (0.93)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.51)
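The excerpt on coherence-driven inference does not define its coherence computation. The following is a minimal sketch that assumes the standard constraint-satisfaction formulation of coherence (after Thagard). Propositions are split into accepted and rejected sets. A positive constraint is satisfied when both endpoints land on the same side, and a negative constraint is satisfied when they land on opposite sides. The goal is the split with the largest total satisfied weight. The `anchored` parameter is a hypothetical stand-in for "hard-coding" acceptance of conclusively established propositions. The exhaustive search is exponential and only suitable for toy inputs.

```python
from itertools import product
from typing import FrozenSet, Iterable, List, Set, Tuple

# A constraint is (u, v, weight): weight > 0 means u and v cohere (positive
# constraint); weight < 0 means they are incoherent (negative constraint).
Constraint = Tuple[str, str, float]


def coherence(accepted: Set[str], constraints: Iterable[Constraint]) -> float:
    """Total weight of satisfied constraints for a given accepted set.

    A positive constraint is satisfied when both endpoints share a side
    (both accepted or both rejected); a negative constraint is satisfied
    when the endpoints fall on opposite sides.
    """
    total = 0.0
    for u, v, w in constraints:
        same_side = (u in accepted) == (v in accepted)
        if (w > 0 and same_side) or (w < 0 and not same_side):
            total += abs(w)
    return total


def best_partition(
    elements: List[str],
    constraints: List[Constraint],
    anchored: FrozenSet[str] = frozenset(),
) -> Tuple[Set[str], float]:
    """Exhaustively search accept/reject assignments (small inputs only).

    Elements in `anchored` are hard-coded as accepted in every candidate;
    ties keep the first partition found.
    """
    free = [e for e in elements if e not in anchored]
    best: Tuple[Set[str], float] = (set(anchored), float("-inf"))
    for bits in product([False, True], repeat=len(free)):
        accepted = set(anchored) | {e for e, keep in zip(free, bits) if keep}
        score = coherence(accepted, constraints)
        if score > best[1]:
            best = (accepted, score)
    return best


if __name__ == "__main__":
    elements = ["obs", "p", "q"]
    constraints = [("obs", "p", 1.0), ("p", "q", -1.0)]
    print(best_partition(elements, constraints, frozenset({"obs"})))
    print(best_partition(elements, constraints))
```

With `obs` anchored, the search returns `{obs, p}` with score 2.0. Without the anchor it returns `{q}`, which also scores 2.0: coherence alone cannot distinguish a partition from its mirror image, so the anchor is what fixes which side counts as accepted.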
Neurosymbolic artificial intelligence via large language models and coherence-driven inference
Huntsman, Steve, Thomas, Jewell
We devise an algorithm to generate sets of propositions that objectively instantiate graphs that support coherence-driven inference. We then benchmark the ability of large language models (LLMs) to reconstruct coherence graphs from (a straightforward transformation of) propositions expressed in natural language, with promising results from a single prompt to models optimized for reasoning. Combining coherence-driven inference with consistency evaluations by neural models may advance the state of the art in machine cognition.
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Europe > Netherlands > Gelderland > Nijmegen (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Government > Military (1.00)
- Government > Regional Government > North America Government > United States Government (0.93)
- Law (0.67)
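The Neurosymbolic AI abstract does not specify how LLM-reconstructed coherence graphs are scored against the generated ground truth. As an illustration only, the sketch below assumes graphs are undirected with edges signed +1 (coherent) or -1 (incoherent), and it reports precision and recall over correctly signed edges. `SignedGraph`, `edge`, and `reconstruction_scores` are hypothetical names, and this metric is an assumption rather than the benchmark's actual measure.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical representation: an undirected signed graph maps an unordered
# pair of proposition IDs to +1 (coherent) or -1 (incoherent).
SignedGraph = Dict[FrozenSet[str], int]


def edge(u: str, v: str) -> FrozenSet[str]:
    """Unordered key for the edge between propositions u and v."""
    return frozenset((u, v))


def reconstruction_scores(truth: SignedGraph, predicted: SignedGraph) -> Tuple[float, float]:
    """Return (precision, recall) over signed edges.

    A predicted edge counts as correct only if the same pair appears in the
    ground truth with the same sign.
    """
    correct = sum(1 for e, sign in predicted.items() if truth.get(e) == sign)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    truth = {edge("a", "b"): +1, edge("b", "c"): -1, edge("a", "c"): +1}
    predicted = {edge("a", "b"): +1, edge("b", "c"): +1}
    print(reconstruction_scores(truth, predicted))
```

In the usage example only the a-b edge is recovered with the right sign (b-c has the wrong sign and a-c is missing), giving precision 0.5 and recall 1/3. Counting a wrong sign as an error, rather than as a partial match, reflects the assumption that the sign carries the coherence information that matters.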
Artificial Intelligence: Too Fragile to Fight?
You can become utterly dependent on a new glamorous technology, be it cyber-space, artificial intelligence. . . But does it create a potential Achilles' heel? Artificial intelligence (AI) has become the technical focal point for advancing naval and Department of Defense (DoD) capabilities. Secretary of the Navy Carlos Del Toro listed AI first among his priorities for innovating U.S. naval forces. Chief of Naval Operations Admiral Michael Gilday listed it as his top priority during his Senate confirmation hearing.²
- North America > United States (1.00)
- Europe > France (0.04)
- Asia > Middle East > Iran > Hormozgan Province > Bandar Abbas (0.04)
- Government > Regional Government > North America Government > United States Government (1.00)
- Government > Military (1.00)
How We Use Machine Learning for Targeted Location Monitoring
For a while now, the DigitalGlobe GBDX team has been running machine learning-based object detection at a significant, continental scale. Each time we add a new model to GBDX, we kick the tires and run comparisons to find its advantages or disadvantages relative to existing capabilities. We keep our customers' use cases in mind, which typically boil down to "monitoring and change" or "pattern of life" activities. Examples of what we monitor with today's models include changes or activity in a parking lot or port. With that in mind, we wanted to take a "state of the union" (or "state of the map") look at where machine learning on satellite imagery stands today.
- Asia > Middle East > Lebanon > Beirut Governorate > Beirut (0.06)
- North America > United States > Alabama > Montgomery County > Montgomery (0.05)
- Asia > Middle East > Iran > Hormozgan Province > Bandar Abbas (0.05)
- Information Technology (0.75)
- Aerospace & Defense (0.64)